Textual Spatial Cosine Similarity

Author

  • Giancarlo Crocetti
Abstract

Many methods exist for measuring document similarity, cosine similarity among them. More complex methods based on semantic analysis of the text are also available, but they are computationally expensive and rarely used for real-time content feeds such as those in enterprise-wide search environments. To address these real-time constraints, we developed a new measure of document similarity called Textual Spatial Cosine Similarity, which detects similarity at the semantic level using the word placement information contained in the document. We show in this paper that the model has two degenerate cases: it coincides with Cosine Similarity on one side and with a paraphrase detection model on the other.
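For orientation, the baseline that the abstract names, plain cosine similarity over bag-of-words counts, can be sketched as follows. This is a minimal illustration only: the abstract does not give the Textual Spatial Cosine Similarity formula, so nothing here reproduces the author's measure. The function name `cosine_similarity` and the whitespace tokenization are assumptions made for the example.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors.

    Hypothetical helper for illustration; tokenization is a naive
    lowercase whitespace split, which is an assumption of this sketch.
    """
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Dot product over the terms the two documents share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


# Same words, different order: the term counts are identical, so the
# score is approximately 1.0 even though the meanings differ.
s1 = "the dog bit the man"
s2 = "the man bit the dog"
print(cosine_similarity(s1, s2))
```

Because the two sentences contain the same words with the same counts, the bag-of-words score cannot tell them apart. That blindness to word order is the kind of gap a measure using word placement information is meant to address.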

Similar resources

SimBow at SemEval-2017 Task 3: Soft-Cosine Semantic Similarity between Questions for Community Question Answering

This paper describes the SimBow system submitted at SemEval-2017 Task 3, for the question-question similarity subtask B. The proposed approach is a supervised combination of different unsupervised textual similarities. These textual similarities rely on the introduction of a relation matrix in the classical cosine similarity between bag-of-words, so as to get a soft-cosine that takes into account ...

SAGAN: An approach to Semantic Textual Similarity based on Textual Entailment

In this paper we report the results obtained in the Semantic Textual Similarity (STS) task, with a system primarily developed for textual entailment. Our results are quite promising, getting a run ranked 39 in the official results with overall Pearson, and ranking 29 with the Mean metric.

Two Approaches to Handling Noisy Variation in Text Mining

Variation and noise in textual database entries can prevent text mining algorithms from discovering important regularities. We present two novel methods to cope with this problem: (1) an adaptive approach to “hardening” noisy databases by identifying duplicate records, and (2) mining “soft” association rules. For identifying approximately duplicate records, we present a domain-independent two-l...

Similarity of Medical Cases in Health Care Using Cosine Similarity and Ontology

The increasing use of digital patient records in hospitals saves time and reduces the risk of wrong treatments caused by lack of information. Digital patient records also enable efficient spread and transfer of experience gained from the diagnosis and treatment of individual patients, which nowadays is mostly manual (speaking with colleagues) and rarely aided by computerized systems. Most of the content...

Hybrid self-optimized clustering model based on citation links and textual features to detect research topics

The challenge of detecting research topics in a specific research field has attracted attention from researchers in the bibliometrics community. In this study, to solve two problems of clustering papers, i.e., the influence of different distributions of citation links and involved textual features on similarity computation, the authors propose a hybrid self-optimized clustering model to detect ...

Journal title:
  • CoRR

Volume abs/1505.03934  Issue 

Pages  -

Publication date 2014